Fix platform evaluation on macos CI #134
base: main
The run.sh script used the $OSTYPE variable to determine whether the host system is Apple. This variable is set by bash itself rather than inherited from the environment, so it is not guaranteed to be set elsewhere - and is potentially why CI is failing on Mac in this branch. Using uname -s is likely a more robust check locally and in CI.
$OSTYPE was likely a red herring and not the reason the OS check was failing to evaluate. Needs a little more debugging.
Replace use of system_profiler for identification with another call to uname. It may be that SPDisplaysDataType does not function as expected on the virtualised github mac runners. Also we are properly ensuring the host is an arm mac this way.
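A minimal sketch of what an uname-only check for an Arm Mac might look like. The function name `is_apple_silicon` and its optional arguments are hypothetical and exist only so the logic can be exercised on any host; this is not the code from run.sh. It assumes `uname -m` reports `arm64` on Apple silicon, which does not hold for a shell running under Rosetta.

```shell
# Hypothetical helper, not taken from run.sh.
# Succeeds only when the kernel is Darwin and the machine type is arm64.
# Both arguments are optional overrides; by default the host is queried.
is_apple_silicon() {
  host_kernel="${1:-$(uname -s)}"
  host_arch="${2:-$(uname -m)}"
  if [ "$host_kernel" = "Darwin" ] && [ "$host_arch" = "arm64" ]; then
    return 0
  fi
  return 1
}
```

Called with no arguments, e.g. `if is_apple_silicon; then ...; fi`, it inspects the machine it runs on without relying on system_profiler.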
Now appears to correctly identify the Mac system and Metal support.
Interesting, is this an issue with our stack or is it on GitHub's side?
Fwiw the gpu-spec script is just a utility script, and whether it can find the GPU spec or not is orthogonal to making the CI work.
All of the attempts show there's something wrong with the GitHub Mac GPU runners. It works fine locally.
My remaining theory is that the GH runners are missing some environment variable that we're somehow relying on, but I have no clue what that would be. Not familiar with our platform detection logic.
This may well be true, but I wanted to make sure we could correctly see Metal capability on the host, and fixing the detection lets us implement a list of tests that can be skipped since they're not compatible - like the NVIDIA high compute list. However, the current errors from the CI say:
The missing value for architecture in that error makes me think that we're actually looking at a very similar issue during execution, but I haven't got to that yet :)
For your interest - we were using
Curious why we need these detection scripts in the first place. Does Mojo not do the right thing™ and just detect the GPUs (ignoring the Apple issue, but other GPUs specifically)?
Some of this is for limiting the tests executed based on detected hardware capability, which I think makes sense especially given that this repo is being used by people who might well be using consumer hardware. But I think some is just legacy and can be optimised.
The script is used initially for users to see the specs of their GPU, and in the tests to detect which puzzles to skip, esp. bc the later ones require H100+ and this CI is on T4. So it's not only cleaner but also makes things less confusing if users don't have access to our tier 1 GPUs.
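A rough sketch of the skip mechanism described here, under stated assumptions: the puzzle names, the `HIGH_COMPUTE_PUZZLES` variable, and the `should_skip` function are all hypothetical placeholders, not the repo's actual list or logic.

```shell
# Hypothetical skip list; the names are placeholders, not real puzzle names.
HIGH_COMPUTE_PUZZLES="p_example_a p_example_b"

# should_skip <puzzle> <has_high_compute: yes|no>
# Succeeds (exit 0) when the puzzle needs high-compute hardware
# that the host was not detected to have.
should_skip() {
  puzzle="$1"
  has_high_compute="$2"
  if [ "$has_high_compute" = "yes" ]; then
    return 1  # capable host: run everything
  fi
  for name in $HIGH_COMPUTE_PUZZLES; do
    if [ "$name" = "$puzzle" ]; then
      return 0  # listed puzzle on a host without the capability: skip
    fi
  done
  return 1
}
```

A test loop could then call `should_skip "$puzzle" "$has_high_compute" && continue` before running each puzzle, with the second argument supplied by whatever detection step the CI uses.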
A temporary mechanism to collect some information about the hardware from system_profiler to see if anything reports unusually. I think these CI runners do something a little differently.
Trying a little in-line Swift to gather information about the GPU, since it's clear from system_profiler that the CPU appears with a different designation thanks to being virtualised. Perhaps the GPU does too, and therefore is not recognised properly by info.mojo. Using Swift since system_profiler SPDisplaysDataType doesn't seem to work for headless machines.
Fixed the capabilities being requested and moved the Swift code to a temporary standalone script file to make things easier.
Adding to @ehsanmok and @Ahajha's recent work to try to set up Mac CI.
The run.sh script used the $OSTYPE variable to determine whether the host
system is Apple. This variable is set by bash itself rather than inherited
from the environment, so it is not guaranteed to be set elsewhere - and is
potentially why CI is failing to evaluate the host OS as Mac in this branch.
Using uname -s is likely a more robust check locally and in CI.
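
The uname -s check described above could be sketched roughly as follows. The function name `detect_os` and its optional argument are hypothetical (the argument exists only so the branches can be exercised on any host); this is an illustration, not the actual change to run.sh.

```shell
# Hypothetical helper, not taken from run.sh.
# Classifies the host from the kernel name reported by `uname -s`,
# which does not depend on which shell is running the script.
detect_os() {
  kernel_name="${1:-$(uname -s)}"
  case "$kernel_name" in
    Darwin) echo "macos" ;;
    Linux)  echo "linux" ;;
    *)      echo "unknown" ;;
  esac
}
```

A script could then branch with `if [ "$(detect_os)" = "macos" ]; then ...; fi` instead of pattern-matching on $OSTYPE.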